A Lower Bound on the Complexity of 
Approximating the Entropy of a Markov Source 
The Asymptotic Equipartition Property (see, e.g., [3]) implies that, if we choose the characters
of a string $s$ of length $n$ independently and according to the same probability distribution $P$ over
the alphabet then, for large values of $n$, the 0th-order empirical entropy $H_0(s)$ of $s$ (see, e.g., [4])
will almost certainly be close to the entropy $H(P)$ of $P$. Batu, Dasgupta, Kumar and Rubinfeld [1]
showed that, if $H(P) = \Omega(\gamma/\epsilon)$, then we can almost certainly approximate $H(P)$ to within a factor
of $\gamma$ after seeing $O(\sigma^{(1+\epsilon)/\gamma^2} \log \sigma)$ characters of $s$, where $\sigma$ is the alphabet size and $\epsilon$ is any positive
constant; they proved a lower bound of $\Omega(\sigma^{1/(2\gamma^2)})$, which was later improved by Raskhodnikova,
Ron, Shpilka and Smith [5] and Valiant [6].
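As a concrete illustration of the 0th-order empirical entropy, here is a minimal Python sketch. It assumes logarithms base 2 (so entropy is measured in bits per character) and the usual definition $H_0(s) = \sum_a (n_a/n) \log(n/n_a)$, where $n_a$ is the number of occurrences of character $a$ in $s$; the function name `empirical_entropy_0` is hypothetical. The last lines draw characters independently and uniformly from a 4-character alphabet, so by the Asymptotic Equipartition Property $H_0$ should come out close to $H(P) = 2$ bits.

```python
import random
from collections import Counter
from math import log2


def empirical_entropy_0(s):
    """0th-order empirical entropy H_0(s), in bits per character."""
    n = len(s)
    if n == 0:
        return 0.0
    # Sum over distinct characters a of (n_a / n) * log2(n / n_a).
    return sum((c / n) * log2(n / c) for c in Counter(s).values())


print(empirical_entropy_0("abab"))  # 1.0
print(empirical_entropy_0("aaaa"))  # 0.0

# i.i.d. uniform characters over a 4-character alphabet: H(P) = 2 bits.
rng = random.Random(0)
sample = [rng.randrange(4) for _ in range(100_000)]
print(empirical_entropy_0(sample))  # close to 2.0 for large samples
```

Note that $H_0$ ignores the order of the characters entirely; it depends only on their frequencies.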

Similarly, the Shannon-McMillan-Breiman Theorem (see, e.g., [3] again) implies that, if we
generate $s$ from a stationary ergodic $k$th-order Markov source $X$ then, for large values of $n$, the
$k$th-order empirical entropy $H_k(s)$ of $s$ (see, e.g., [4] again) will almost certainly be close to the
entropy $H(X)$ of $X$. Although many papers have been written about approximating the entropy of
a Markov source based on a sample (see, e.g., [2] and references therein), we know of no upper or
lower bounds similar to Batu et al.'s results. We now give a simple proof that, even if we know $X$
has entropy either 0 or at least $\log(\sigma - k)$, there is still no algorithm that, with probability bounded
away from 1/2, guesses its entropy correctly after seeing at most $(\sigma - k)^{k/2 - \epsilon}$ characters.
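For the $k$th-order empirical entropy, the sketch below assumes one common convention, $H_k(s) = \frac{1}{|s|} \sum_{w} |w_s| \, H_0(w_s)$, where $w$ ranges over $k$-tuples and $w_s$ is the string of characters that immediately follow occurrences of $w$ in $s$. Function names are hypothetical and logarithms are base 2. The example shows why higher-order contexts matter: a periodic string has high 0th-order but zero 1st-order empirical entropy.

```python
from collections import Counter, defaultdict
from math import log2


def empirical_entropy_0(s):
    """0th-order empirical entropy in bits per character."""
    n = len(s)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(s).values())


def empirical_entropy_k(s, k):
    """kth-order empirical entropy under the convention
    H_k(s) = (1/|s|) * sum over k-tuples w of |w_s| * H_0(w_s),
    where w_s lists the characters that immediately follow occurrences of w."""
    if not s:
        return 0.0
    if k == 0:
        return empirical_entropy_0(s)
    followers = defaultdict(list)
    for i in range(len(s) - k):
        followers[tuple(s[i:i + k])].append(s[i + k])
    return sum(len(f) * empirical_entropy_0(f) for f in followers.values()) / len(s)


# A periodic string looks random to a 0th-order model but not to a 1st-order one.
print(empirical_entropy_k("abababab", 0))  # 1.0
print(empirical_entropy_k("abababab", 1))  # 0.0
```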

Lemma 1. For any $k \geq 1$, $\epsilon > 0$ and sufficiently large $\sigma$, there is a $k$th-order Markov source over
the alphabet $\{0, \ldots, \sigma - 1\}$ that has entropy at least $\log(\sigma - k)$ but, with high probability, does not
emit duplicate $k$-tuples among its first $(\sigma - k)^{k/2 - \epsilon}$ characters.

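Proof. Consider the $k$th-order Markov source that, whenever it has emitted a $k$-tuple $\alpha = a_1, \ldots, a_k$,
emits a character drawn uniformly at random from $\{0, \ldots, \sigma - 1\} \setminus \{a_1, \ldots, a_k\}$. Notice this source
has entropy at least $\log(\sigma - k)$, since each character is drawn uniformly from a set of at least $\sigma - k$
characters. Also, a $k$-tuple $\alpha$ cannot occur in position $i$ if it occurs in any of
the positions $i - k + 1, \ldots, i - 1, i + 1, \ldots, i + k - 1$, and vice versa: if $\alpha$ occurred in two positions
less than $k$ apart, the character following the earlier occurrence would lie inside the later one and
so would belong to $\alpha$, which the source forbids. Finally, the probability $\alpha$ occurs
in position $i$ is independent of whether it occurs in position $j$ for $j < i - k$ or $j > i + k$.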
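The source in this proof is easy to simulate. The sketch below is an illustration under assumptions, not part of the proof: the proof does not say how the first $k$ characters are chosen, so here each early character simply avoids the characters emitted so far, and the names `sample_lemma_source` and `has_duplicate_ktuple` are hypothetical. Because every character differs from the $k$ characters before it, no $k$-tuple can reappear within fewer than $k$ positions, which is the overlap property used above.

```python
import random


def sample_lemma_source(sigma, k, m, rng):
    """Emit m characters: each is uniform over {0, ..., sigma - 1} minus the
    (at most k) characters emitted just before it. Assumption: the first k
    characters only avoid the characters emitted so far."""
    out = []
    for _ in range(m):
        recent = set(out[-k:])  # the at most k most recent characters
        out.append(rng.choice([c for c in range(sigma) if c not in recent]))
    return out


def has_duplicate_ktuple(s, k):
    """True if some k-tuple occurs at two different positions of s."""
    seen = set()
    for i in range(len(s) - k + 1):
        w = tuple(s[i:i + k])
        if w in seen:
            return True
        seen.add(w)
    return False


sigma, k, eps = 1000, 2, 0.25
m = int((sigma - k) ** (k / 2 - eps))  # prefix length (sigma - k)^(k/2 - eps)
s = sample_lemma_source(sigma, k, m, random.Random(0))
print(m, has_duplicate_ktuple(s, k))
```

Rebuilding the allowed set costs $O(\sigma)$ per character, which is fine for a demonstration.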

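For $i - k + 1 \leq j \leq i + k - 1$, let the indicator variable $B_j$ be 1 if $\alpha$ occurs in position $j$, and 0
otherwise. By Bayes' Rule, the probability $\alpha$ occurs in position $i$, given that it does not occur in
any of the positions $i - k + 1, \ldots, i - 1, i + 1, \ldots, i + k - 1$, is
$$\Pr[B_i = 1 \mid B_j = 0 \text{ for all } j \neq i] = \frac{\Pr[B_j = 0 \text{ for all } j \neq i \mid B_i = 1] \, \Pr[B_i = 1]}{\Pr[B_j = 0 \text{ for all } j \neq i]} = \frac{\Pr[B_i = 1]}{\Pr[B_j = 0 \text{ for all } j \neq i]},$$
since $B_i = 1$ forces $B_j = 0$ for every $j \neq i$.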
It follows that the probability $\alpha$ occurs at least twice among the first $(\sigma - k)^{k/2 - \epsilon}$ emitted characters
is at most the probability that, while drawing $(\sigma - k)^{k/2 - \epsilon}$ elements uniformly at random and with
replacement from a set of size $(\sigma - k)^k$, we draw a specified element at least twice. Therefore, the
probability any $k$-tuple occurs at least twice among the first $(\sigma - k)^{k/2 - \epsilon}$ emitted characters is at
most the probability that we draw any element at least twice. For $k \geq 1$ and sufficiently large $\sigma$,
both probabilities are negligible: with $m = (\sigma - k)^{k/2 - \epsilon}$ draws, a union bound over the $\binom{m}{2}$ pairs
of draws shows that the probability we draw any element at least twice is at most
$\binom{m}{2} / (\sigma - k)^k \leq (\sigma - k)^{-2\epsilon} / 2$, which tends to 0 as $\sigma$ grows. □
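The last step of the proof compares the source with drawing $m = (\sigma - k)^{k/2 - \epsilon}$ elements uniformly with replacement from a set of size $N = (\sigma - k)^k$. As a rough numerical check, the sketch below computes the exact birthday-problem probability of a repeated draw and the union bound $\binom{m}{2}/N$, which is at most $(\sigma - k)^{-2\epsilon}/2$. The function names and the parameter values ($k = 2$, $\epsilon = 0.1$) are arbitrary choices for illustration.

```python
from math import comb


def collision_probability(m, N):
    """Exact probability that m independent uniform draws from a set of
    size N contain some element at least twice (the birthday problem)."""
    if m > N:
        return 1.0
    p_all_distinct = 1.0
    for i in range(m):
        p_all_distinct *= (N - i) / N
    return 1.0 - p_all_distinct


def union_bound(m, N):
    """Upper bound C(m, 2) / N: one term for each pair of draws."""
    return comb(m, 2) / N


k, eps = 2, 0.1
for sigma in (10**3, 10**4, 10**5):
    N = (sigma - k) ** k
    m = int((sigma - k) ** (k / 2 - eps))
    # Columns: sigma, m, exact probability, union bound, (sigma - k)^(-2 eps) / 2.
    print(sigma, m, collision_probability(m, N), union_bound(m, N),
          (sigma - k) ** (-2 * eps) / 2)
```

The bound decays only polynomially in $\sigma$, so for small $\epsilon$ the alphabet must be quite large before the probability of a repeated $k$-tuple is truly tiny.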

Theorem 1. Suppose that, for any $k \geq 1$, $\epsilon > 0$ and sufficiently large $\sigma$, we are given a black box
that allows us to sample characters from a $k$th-order Markov source over the alphabet $\{0, \ldots, \sigma - 1\}$.
Even if we know the source has entropy either 0 or at least $\log(\sigma - k)$, there is still no algorithm
that, with probability bounded away from 1/2, guesses the entropy correctly after sampling at most
$(\sigma - k)^{k/2 - \epsilon}$ characters.

Proof. Consider any algorithm $A$ for guessing the source's entropy. Suppose there is a string $s$ of
length $(\sigma - k)^{k/2 - \epsilon}$ containing no duplicate $k$-tuples and such that, with probability at least 1/2,
$A$ stops and guesses "at least $\log(\sigma - k)$" after sampling a prefix of $s$. Then on any source with
entropy 0 that starts by emitting $s$ with probability 1, the algorithm errs with probability at least
1/2. Given $s$, it is straightforward to build such a source: since no $k$-tuple occurs twice in $s$, letting
each $k$-tuple of $s$ determine the character that follows it gives a $k$th-order source whose transitions
are all deterministic.
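One way to build the entropy-0 source is a lookup table from each $k$-tuple of $s$ to the character that follows it, started on the first $k$ characters of $s$. In this sketch the names are hypothetical, and once $s$ is exhausted the source emits a fixed default character (an assumption; any deterministic rule keeps the entropy 0). The table only needs each $k$-tuple of $s$ to be followed by a single character, which certainly holds when $s$ has no duplicate $k$-tuples.

```python
def build_zero_entropy_source(s, k):
    """Start state and transition table of a kth-order source that emits s
    with probability 1. Raises ValueError if some k-tuple of s is followed
    by two different characters (impossible when s has no duplicate k-tuples)."""
    table = {}
    for i in range(len(s) - k):
        w, nxt = tuple(s[i:i + k]), s[i + k]
        if table.setdefault(w, nxt) != nxt:
            raise ValueError(f"k-tuple {w} is followed by two different characters")
    return tuple(s[:k]), table


def emit(start, table, k, m, default=0):
    """Emit m characters deterministically from the last k characters.
    After s is exhausted, fall back to a fixed (hypothetical) default."""
    out = list(start[:m])
    while len(out) < m:
        out.append(table.get(tuple(out[-k:]), default))
    return out


s = [3, 1, 4, 1, 5, 9, 2, 6]
start, table = build_zero_entropy_source(s, 2)
print(emit(start, table, 2, 10))  # [3, 1, 4, 1, 5, 9, 2, 6, 0, 0]
```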

Now suppose there is no such string $s$. Then whenever the first $(\sigma - k)^{k/2 - \epsilon}$ sampled characters
contain no duplicate $k$-tuples, $A$ either samples more characters or stops and guesses "0", with
probability at least 1/2. Therefore, on any source with entropy at least $\log(\sigma - k)$ that, with high
probability, does not emit duplicate $k$-tuples among its first $(\sigma - k)^{k/2 - \epsilon}$ characters (such as the
one described in Lemma 1), $A$ either samples more characters or errs, with probability
nearly 1/2. □
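The two cases of the proof can be illustrated together with a small simulation, a sketch with hypothetical names and arbitrarily chosen parameters. It samples a prefix $s$ from the high-entropy source of Lemma 1 and, when $s$ has no duplicate $k$-tuples, replays it from an entropy-0 source built from a lookup table of $s$'s $k$-tuples. Both sources then produce exactly the same prefix, so an algorithm that has seen only these characters has nothing with which to tell them apart.

```python
import random


def lemma_source_sample(sigma, k, m, rng):
    """High-entropy source: each character avoids the previous k characters."""
    out = []
    for _ in range(m):
        recent = set(out[-k:])
        out.append(rng.choice([c for c in range(sigma) if c not in recent]))
    return out


def replay_as_zero_entropy(s, k, m, default=0):
    """Entropy-0 source built from s; assumes s has no duplicate k-tuples."""
    table = {tuple(s[i:i + k]): s[i + k] for i in range(len(s) - k)}
    out = list(s[:k])
    while len(out) < m:
        out.append(table.get(tuple(out[-k:]), default))
    return out[:m]


def has_duplicate_ktuple(s, k):
    windows = [tuple(s[i:i + k]) for i in range(len(s) - k + 1)]
    return len(set(windows)) < len(windows)


sigma, k, m = 1000, 2, 150
s = lemma_source_sample(sigma, k, m, random.Random(7))
if not has_duplicate_ktuple(s, k):
    # Identical prefixes: any algorithm sees the same input from both sources.
    print(replay_as_zero_entropy(s, k, m) == s)
```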